By applying the content loss at different layers, we can see how the reconstructed image varies. The deeper the layer the loss is applied to, the more noise appears in the details of the reconstructed image.
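To make this concrete, here is a minimal sketch of a per-layer content loss. The frozen VGG-19 extractor and the `features_at` helper are my assumptions for illustration, not necessarily the exact implementation used here.

```python
import torch.nn.functional as nnF
from torchvision.models import vgg19

# Hypothetical feature extractor: a frozen, pretrained VGG-19.
vgg_features = vgg19(weights="DEFAULT").features.eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def features_at(img, layer_idx):
    """Return the activation of `img` at the given index of VGG-19's feature stack."""
    x = img
    for i, module in enumerate(vgg_features):
        x = module(x)
        if i == layer_idx:
            return x
    raise IndexError("layer_idx out of range")

def content_loss(generated, content, layer_idx):
    """MSE between the feature maps of the generated and the content image."""
    return nnF.mse_loss(features_at(generated, layer_idx),
                        features_at(content, layer_idx))
```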
| Layer optimized with content loss | Reconstructed image | Detail |
| --- | --- | --- |
| conv1 | | |
| conv4 | | |
| conv7 | | |
| conv10 | | |
| conv13 | | |
| Layer | Noise 1 | Noise 2 | Reconstruction from noise 1 | Reconstruction from noise 2 |
| --- | --- | --- | --- | --- |
| conv1 | | | | |
| conv4 | | | | |
| conv7 | | | | |
| conv10 | | | | |
| conv13 | | | | |
As we can see, when the style loss is optimized at deeper layers, the synthesized texture is more abstract: its shapes resemble the style image more closely, but its colors less so. From my point of view, the combination of conv1, conv3, conv5, conv7, and conv9 looks the best, so I am going to choose this configuration for my style loss.
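For comparison with the content loss above, here is a sketch of a Gram-matrix style loss summed over a set of layers, reusing the hypothetical `features_at` helper from the earlier sketch:

```python
import torch.nn.functional as nnF

def gram_matrix(feat):
    """Channel-wise Gram matrix of a (B, C, H, W) feature map, normalized by its size."""
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def style_loss(generated, style, layer_indices):
    """Sum of MSEs between the Gram matrices at each chosen layer."""
    loss = 0.0
    for idx in layer_indices:
        loss = loss + nnF.mse_loss(gram_matrix(features_at(generated, idx)),
                                   gram_matrix(features_at(style, idx)))
    return loss
```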
| Layers optimized with style loss | Synthesized texture |
| --- | --- |
| conv1, conv2, conv3, conv4, conv5 | |
| conv1, conv3, conv5, conv7, conv9 | |
| conv6, conv7, conv8, conv9, conv10 | |
| conv11, conv12, conv13, conv14, conv15 | |
| Layers | Noise 1 | Noise 2 | Synthesized texture from noise 1 | Synthesized texture from noise 2 |
| --- | --- | --- | --- | --- |
| conv1, conv2, conv3, conv4, conv5 | | | | |
| conv1, conv3, conv5, conv7, conv9 | | | | |
| conv6, conv7, conv8, conv9, conv10 | | | | |
| conv11, conv12, conv13, conv14, conv15 | | | | |
#### Style transfer results
From the results, the performance differs with different combinations of content image and style image, but in most cases the values 100000 and 1000000 work the best.
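Assuming these values are the style-loss weight (the usual hyperparameter in this range), they trade the style term off against the content term in the total loss. Building on the sketches above, with everything except the weight values being my assumption:

```python
content_weight = 1        # assumed; only the ratio of the two weights matters
style_weight = 100_000    # or 1_000_000, the two values that worked best

total = (content_weight * content_loss(generated, content_img, content_layer)
         + style_weight * style_loss(generated, style_img, style_layers))
```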
By comparison, the output images generated from noise look more texture-like. Some elements of the original content image are maintained, but they remain very vague.
| From noise | From content |
| --- | --- |
Generally speaking, generating the style-transferred image from the content image is roughly a quarter faster than generating it from noise.
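The two settings differ only in how the optimized image is initialized; a minimal sketch (the choice of L-BFGS is my assumption):

```python
import torch

# Initialize from the content image (typically converges faster) ...
generated = content_img.clone().requires_grad_(True)
# ... or from random noise of the same shape:
# generated = torch.randn_like(content_img).requires_grad_(True)

optimizer = torch.optim.LBFGS([generated])
```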
I took this photo at the Palace of Versailles, near Paris, last December. It was very impressive, and I really recommend sightseeing there :)
This photo was also taken by me! I was standing on the left bank of the Seine, in front of the Musée d'Orsay in Paris. The style image, "Starry Night Over the Rhone", is one of the Van Gogh paintings in the Musée d'Orsay's collection. When I saw the Seine after visiting the museum, I immediately recalled this painting.
This style image was painted by Klimt, one of my favorite artists. I really recommend checking out his works.
I, again, became the sun.
### Resizing and cropping the style image
The basic idea is to keep the content image at its original size. Therefore, the first step is to check whether the style image is large enough; if it is, we can jump straight to cropping. (If the style image is far too large, we could also set an upper limit on the ratio between the style image's size and the content image's, and resize the style image down to a reasonable scale.)
Then, for style images that are not large enough, we check the ratio of the style image's width to the content image's width, as well as the ratio of their heights.
When one of the ratios is >= 1, that edge is already long enough, and we expand the style image according to the other edge's ratio.
When both ratios are < 1, we divide by the smaller of the two ratios to expand the style image. Doing so automatically expands the other edge to a valid length at the same time.
```python
import torchvision.transforms.functional as F

# style_width/style_height and content_width/content_height are assumed to
# hold the dimensions of the two images.
width_ratio = style_width / content_width
height_ratio = style_height / content_height
style_ratio = style_height / style_width  # aspect ratio of the style image

# Step 1: the style image already covers the content size; nothing to do.
if width_ratio >= 1 and height_ratio >= 1:
    pass
# Step 2: only the width is too small; match it and scale the height along.
elif width_ratio < 1 and height_ratio >= 1:
    new_width = content_width
    new_height = int(new_width * style_ratio) + 1  # +1 guards against rounding down
    style_img = F.resize(style_img, (new_height, new_width))
# Step 3: only the height is too small; match it and scale the width along.
elif width_ratio >= 1 and height_ratio < 1:
    new_height = content_height
    new_width = int(new_height / style_ratio) + 1
    style_img = F.resize(style_img, (new_height, new_width))
# Step 4: both edges are too small; divide by the smaller ratio.
else:
    ratio = min(width_ratio, height_ratio)
    new_height = int(style_height / ratio) + 1
    new_width = int(style_width / ratio) + 1
    style_img = F.resize(style_img, (new_height, new_width))
```
Finally, we can crop the image. We could start from the top-left corner, or specify any valid starting point that keeps the crop within range.
```python
# Deterministic crop anchored at the top-left corner
top, left, height, width = 0, 0, content_height, content_width
```
We could also use a random crop to get more diverse results, since the synthesized image varies with the crop of the style image it is given.
```python
from torchvision.transforms import RandomCrop

# Sample a random, in-range crop window of the content image's size
top, left, height, width = RandomCrop.get_params(style_img, (content_height, content_width))
```
Then, we crop the style image accordingly:
```python
style_img = F.crop(style_img, top=top, left=left, height=height, width=width)
```
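Putting the pieces together, here is a hypothetical helper, `match_style_to_content`, that combines the resizing logic above with a random crop. Note that dividing by the smaller ratio also reproduces Steps 2 and 3, so the three resize branches collapse into one:

```python
import torchvision.transforms.functional as F
from torchvision.transforms import RandomCrop

def match_style_to_content(style_img, content_height, content_width):
    """Scale the style image up until it covers the content size, then random-crop it."""
    style_height, style_width = style_img.shape[-2:]  # assumes a (C, H, W) tensor
    width_ratio = style_width / content_width
    height_ratio = style_height / content_height
    if width_ratio < 1 or height_ratio < 1:
        # Dividing by the smaller ratio brings both edges up to a valid length.
        ratio = min(width_ratio, height_ratio)
        new_height = int(style_height / ratio) + 1
        new_width = int(style_width / ratio) + 1
        style_img = F.resize(style_img, (new_height, new_width))
    top, left, height, width = RandomCrop.get_params(
        style_img, (content_height, content_width))
    return F.crop(style_img, top=top, left=left, height=height, width=width)
```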